library(topGO)
library(knitr)   # to use kable()
library(limma)   # to use vennDiagram()
library(ggplot2)
TAG <- params$TAG
VAR <- params$VAR
ANNOTATION <- list(genes = '../2019-07-26/genes/annotation.txt',
                   isoforms = '../2019-07-26/transcripts/annotation.txt')
TAGDIR <- paste0('../2020-01-08/', TAG, '/')

Introduction

This is the enrichment analysis for genes regulated by regime. Because I quantified the association of expression with regime in two different ways, this document has two sections. Section Variance reports the enrichment analysis performed with genes ordered by the proportion of expression variance explained by regime. Note that some genes with low variance may still be highly associated with regime, even if the fold change between levels of this factor is low. Section Differential expression uses an ordering of genes based on the significance of the differential expression between levels of regime, which does depend on fold change.

Reading the data

Functional annotation is in 2019-07-26. I will also upload two lists of genes, with either proportion of variance explained by regime or p-value of differential expression test.

tagVariance <- read.table(paste0(TAGDIR, VAR, '_variance.txt'))
tagPValue   <- read.table(paste0(TAGDIR, VAR, '_pvalue.txt'))
annotation  <- read.table(ANNOTATION[[TAG]], col.names = c('tagname', 'goterms'))

To initialize the topGOdata object, I will need the gene list as a named numeric vector, where the names are the genes identifiers and the numeric values, either the portion of variance explained by regime or the p-values of the differential expression test. The structure() function adds attributes to an object.

Variance <- structure(tagVariance[,1], names = row.names(tagVariance))
PValues  <- structure(tagPValue[,1],   names = row.names(tagPValue))
rm(tagVariance, tagPValue)

There are 18598 genes scored with a variance portion and a differential expression p-value. It should actually be the exact same genes. The annotation data frame has more than one GO term for every tag, separated by |. I need a named list.

head(annotation)
##       tagname                                                           goterms
## 1 XLOC_000002                                  GO:0008417|GO:0016020|GO:0006486
## 2 XLOC_000009                                  GO:0043130|GO:0005515|GO:0043161
## 3 XLOC_000010                       GO:0005840|GO:0015935|GO:0003735|GO:0006412
## 4 XLOC_000015 GO:0006355|GO:0003700|GO:0006357|GO:0043565|GO:0005634|GO:0032502
## 5 XLOC_000021                       GO:0016567|GO:0006397|GO:0061630|GO:0008270
## 6 XLOC_000036                                  GO:0005272|GO:0006814|GO:0016020
allgenes2GO <- strsplit(as.character(annotation$goterms), "|", fixed = TRUE)
names(allgenes2GO) <- annotation$tagname
rm(annotation)

There are 9691 genes with GO annotations. But the differential expression analysis includes many more genes. In order to include the not-annotated genes in the enrichment analysis, to see if annotated or not annotated genes are more or less often differentially expressed, I should attribute a GO term to them. According to [http://geneontology.org/docs/faq/] nowadays we express lack of annotation by annotating to the root nodes, i.e. GO:0008150 biological_process, GO:0003674 molecular_function, and GO:0005575 cellular_component.

for (tag in unique(c(names(PValues), names(Variance)))) {
   if (! tag %in% names(allgenes2GO)) {
      allgenes2GO <- append(allgenes2GO,
         structure(list(c("GO:0008150", "GO:0003674", "GO:0005575")), names = tag))
   }
}

Using differential expression p-values

Building the topGO object

Creation of a topGO dataset is documented in section 4 of topGO’s the user manual: https://bioconductor.org/packages/release/bioc/vignettes/topGO/inst/doc/topGO.pdf. I need to use the new command, and fill up the slots. The annot object must be a function that compiles “a list of GO terms such that each element in the list is a character vector containing all the gene identifiers that are mapped to the respective GO term.” There are several options, that you can check using help(annFUN.gene2GO), for example. The annFUN.gene2GO requires a gene2GO argument, which is the list of gene-to-GO terms I made before. The geneSelectionFun object is a function that selects the interesting (most significant) genes. It is required to perform Fisher’s exact test. The nodeSize is used to prune the GO hierarchy from the terms which have less than n annotated genes.

I generate three datasets, to analyse each of the three ontologies.

selection <- function(allScores) {return(allScores < 0.05)}
GOdata.BP <- new('topGOdata',
   description = 'Differential expression among Brachionus plicatilis populations.',
   ontology = 'BP', 
   allGenes = PValues,
   annot = annFUN.gene2GO,
   gene2GO = allgenes2GO,
   geneSelectionFun = selection,
   nodeSize = 5)

GOdata.MF <- new('topGOdata',
   description = 'Differential expression among Brachionus plicatilis populations.',
   ontology = 'MF',
   allGenes = PValues,
   annot = annFUN.gene2GO,
   gene2GO = allgenes2GO,
   geneSelectionFun = selection,
   nodeSize = 5)

GOdata.CC <- new('topGOdata',
   description = 'Differential expression among Brachionus plicatilis populations.',
   ontology = 'CC',
   allGenes = PValues,
   annot = annFUN.gene2GO,
   gene2GO = allgenes2GO,
   geneSelectionFun = selection,
   nodeSize = 5)

DataSummary <- data.frame(ontology = c('BP', 'MF', 'CC'),
   Num_Genes = sapply(list(GOdata.BP, GOdata.MF, GOdata.CC), numGenes),
   Num_GO_terms = sapply(list(GOdata.BP, GOdata.MF, GOdata.CC), function(x) length(usedGO(x))))
kable(DataSummary, caption='Number of feasible genes or transcripts and number of GO terms used in each data set.')
rm(DataSummary)
Number of feasible genes or transcripts and number of GO terms used in each data set.
ontology Num_Genes Num_GO_terms
BP 15162 1154
MF 17429 576
CC 12940 291

Running the tests

There are more than one way to test for enrichment. Something that took me a while to understand is that not only there are different test statistics (Fisher’s exact test, Kolmogorov-Smirnov, and others) but also different algorithms: classic, elim, weight… The algorithms are ways to deal with the dependence structure among GO terms due to topology. Some algorithms are compatible with all statistics implemented in topGO. But the weight and the parentchild algorithms are only applicable to Fisher’s exact test. I am not interested in the classic algorithm, which treats GO nodes as independent, and therefore produces an excess of significant results. I will not use the Fisher’s exact test, because it dependes on an arbitrary threshold of significance on non-adjusted p-values.

BP.elim     <- runTest(GOdata.BP, algorithm = "elim",     statistic = "ks")
BP.weight01 <- runTest(GOdata.BP, algorithm = "weight01", statistic = "ks")
BP.lea      <- runTest(GOdata.BP, algorithm = "lea",      statistic = "ks")
MF.elim     <- runTest(GOdata.MF, algorithm = "elim",     statistic = "ks")
MF.weight01 <- runTest(GOdata.MF, algorithm = "weight01", statistic = "ks")
MF.lea      <- runTest(GOdata.MF, algorithm = "lea",      statistic = "ks")
CC.elim     <- runTest(GOdata.CC, algorithm = "elim",     statistic = "ks")
CC.weight01 <- runTest(GOdata.CC, algorithm = "weight01", statistic = "ks")
CC.lea      <- runTest(GOdata.CC, algorithm = "lea",      statistic = "ks")

ResultsSummary <- data.frame(ontology = rep(c("BP", "MF", "CC"), each = 3),
   algorithm = rep(c("elim", "weight01", "lea"), 3),
   TermsTested = sapply(list(BP.elim, BP.weight01, BP.lea, MF.elim, MF.weight01, MF.lea, CC.elim, CC.weight01, CC.lea), function(X) length(score(X))),
   Significant = sapply(list(BP.elim, BP.weight01, BP.lea, MF.elim, MF.weight01, MF.lea, CC.elim, CC.weight01, CC.lea), function(X) sum(score(X) < 0.01)))

kable(ResultsSummary, caption="Number of non-trivial terms tested and those with a score (not corrected p-value) lower than 0.01.")
Number of non-trivial terms tested and those with a score (not corrected p-value) lower than 0.01.
ontology algorithm TermsTested Significant
BP elim 1154 35
BP weight01 1154 23
BP lea 1154 70
MF elim 576 38
MF weight01 576 30
MF lea 576 77
CC elim 291 18
CC weight01 291 10
CC lea 291 35
rm(ResultsSummary)

Results

The topGO package does not pay much attention to what terms are significant because they are overrepresented and which ones are underrepresented among the most differentially expressed genes. I think it’s worth separating them, to facilitate the biological interpretation. Note that not all terms listed in the tables below are significant. The scores for the three methods (elim, weight01 and lea) are non-corrected p-values.

Biological process

orderedTerms <- names(sort(score(BP.weight01)))
significant.weight01 <- score(BP.weight01)[orderedTerms] <= 0.01
significant.lea      <- score(BP.lea)[orderedTerms] <= 0.01
significant.elim     <- score(BP.elim)[orderedTerms] <= 0.01
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
BP.pvalue.sigTerms <- sigTerms

BP.all <- GenTable(GOdata.BP, elim=BP.elim, weight01=BP.weight01, lea=BP.lea,
                   orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))

kable(
   BP.all[BP.all$Significant > BP.all$Expected,],
   caption = "Most over-represented terms of the Biological Process ontology.")
Most over-represented terms of the Biological Process ontology.
GO.ID Term Annotated Significant Expected Rank in elim elim weight01 lea
1 GO:0055085 transmembrane transport 500 262 185.89 1 1.1e-12 4.0e-12 4.2e-16
2 GO:0006508 proteolysis 442 220 164.33 2 2.3e-09 4.1e-08 5.9e-11
3 GO:0006468 protein phosphorylation 276 133 102.61 3 2.0e-06 1.7e-06 2.0e-06
4 GO:0055114 oxidation-reduction process 376 176 139.79 4 2.3e-05 1.4e-05 2.3e-05
5 GO:0006289 nucleotide-excision repair 13 11 4.83 8 0.00047 0.00047 0.00047
6 GO:0007186 G protein-coupled receptor signaling pat… 206 93 76.59 10 0.00092 0.00104 0.00092
7 GO:0007160 cell-matrix adhesion 16 11 5.95 13 0.00116 0.00116 0.00116
8 GO:0007165 signal transduction 544 233 202.25 30 0.00762 0.00136 5.6e-05
9 GO:0006813 potassium ion transport 56 28 20.82 6 0.00039 0.00137 0.00039
10 GO:0006355 regulation of transcription, DNA-templat… 355 158 131.98 5 0.00016 0.00154 0.00016
11 GO:0006470 protein dephosphorylation 58 31 21.56 7 0.00045 0.00173 0.00045
12 GO:0034220 ion transmembrane transport 147 74 54.65 11 0.00096 0.00205 0.00096
13 GO:0005992 trehalose biosynthetic process 6 5 2.23 17 0.00330 0.00330 0.00330
14 GO:1901642 nucleoside transmembrane transport 7 6 2.60 18 0.00360 0.00360 0.00360
15 GO:0006979 response to oxidative stress 26 14 9.67 19 0.00372 0.00372 0.00372
16 GO:0044271 cellular nitrogen compound biosynthetic … 783 300 291.11 891 0.73551 0.00415 0.84980
17 GO:0043161 proteasome-mediated ubiquitin-dependent … 45 23 16.73 23 0.00533 0.00533 0.00533
18 GO:0048870 cell motility 14 8 5.20 189 0.06783 0.00697 0.06783
20 GO:0003341 cilium movement 10 8 3.72 31 0.00809 0.00809 0.00809
21 GO:0060271 cilium assembly 46 28 17.10 9 0.00058 0.00831 0.00058
22 GO:0051726 regulation of cell cycle 43 19 15.99 415 0.22696 0.00915 0.36822
23 GO:0046942 carboxylic acid transport 20 14 7.44 24 0.00537 0.00937 0.00537
24 GO:0006367 transcription initiation from RNA polyme… 13 9 4.83 36 0.01041 0.01041 0.01041
25 GO:0007017 microtubule-based process 195 97 72.50 34 0.00889 0.01181 0.00149
26 GO:0008272 sulfate transport 13 8 4.83 41 0.01226 0.01226 0.01226
27 GO:0007156 homophilic cell adhesion via plasma memb… 21 10 7.81 43 0.01239 0.01239 0.01239
28 GO:0006383 transcription by RNA polymerase III 8 7 2.97 47 0.01335 0.01335 0.01335
29 GO:0043628 ncRNA 3’-end processing 6 4 2.23 48 0.01402 0.01402 0.01402
30 GO:0016998 cell wall macromolecule catabolic proces… 9 6 3.35 49 0.01435 0.01435 0.01435
31 GO:0006032 chitin catabolic process 9 6 3.35 50 0.01435 0.01435 0.01435
32 GO:0006241 CTP biosynthetic process 5 5 1.86 55 0.01470 0.01470 0.01470
33 GO:0046039 GTP metabolic process 5 5 1.86 56 0.01470 0.01470 0.01470
34 GO:0046129 purine ribonucleoside biosynthetic proce… 5 5 1.86 57 0.01470 0.01470 0.01470
35 GO:0006821 chloride transport 8 6 2.97 69 0.01633 0.01633 0.01633
kable(
   BP.all[BP.all$Significant < BP.all$Expected,],
   caption = "Most under-represented terms of the Biological Process ontology.")
Most under-represented terms of the Biological Process ontology.
GO.ID Term Annotated Significant Expected Rank in elim elim weight01 lea
19 GO:1901566 organonitrogen compound biosynthetic pro… 432 139 160.61 678 0.49516 0.00754 0.84012
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
                             lea=significant.lea,
                             elim=significant.elim)))

kable(data.frame(Term = Term(GOTERM[sigTerms]),
                 Definition = Definition(GOTERM[sigTerms]),
                 PValue=score(BP.weight01)[sigTerms]),
      caption = paste('Biological process terms significantly associated with',
                      VAR, 'according to all 3 algorithms', sep=' '))
Biological process terms significantly associated with regime according to all 3 algorithms
Term Definition PValue
GO:0055085 transmembrane transport The process in which a solute is transported across a lipid bilayer, from one side of a membrane to the other. 0.0000000
GO:0006508 proteolysis The hydrolysis of proteins into smaller polypeptides and/or amino acids by cleavage of their peptide bonds. 0.0000000
GO:0006468 protein phosphorylation The process of introducing a phosphate group on to a protein. 0.0000017
GO:0055114 oxidation-reduction process A metabolic process that results in the removal or addition of one or more electrons to or from a substance, with or without the concomitant removal or addition of a proton or protons. 0.0000145
GO:0006289 nucleotide-excision repair A DNA repair process in which a small region of the strand surrounding the damage is removed from the DNA helix as an oligonucleotide. The small gap left in the DNA helix is filled in by the sequential action of DNA polymerase and DNA ligase. Nucleotide excision repair recognizes a wide range of substrates, including damage caused by UV irradiation (pyrimidine dimers and 6-4 photoproducts) and chemicals (intrastrand cross-links and bulky adducts). 0.0004708
GO:0007186 G protein-coupled receptor signaling pathway A series of molecular signals that proceeds with an activated receptor promoting the exchange of GDP for GTP on the alpha-subunit of an associated heterotrimeric G-protein complex. The GTP-bound activated alpha-G-protein then dissociates from the beta- and gamma-subunits to further transmit the signal within the cell. The pathway begins with receptor-ligand interaction, or for basal GPCR signaling the pathway begins with the receptor activating its G protein in the absence of an agonist, and ends with regulation of a downstream cellular process, e.g. transcription. The pathway can start from the plasma membrane, Golgi or nuclear membrane (PMID:24568158 and PMID:16902576). 0.0010380
GO:0007160 cell-matrix adhesion The binding of a cell to the extracellular matrix via adhesion molecules. 0.0011588
GO:0007165 signal transduction The cellular process in which a signal is conveyed to trigger a change in the activity or state of a cell. Signal transduction begins with reception of a signal (e.g. a ligand binding to a receptor or receptor activation by a stimulus such as light), or for signal transduction in the absence of ligand, signal-withdrawal or the activity of a constitutively active receptor. Signal transduction ends with regulation of a downstream cellular process, e.g. regulation of transcription or regulation of a metabolic process. Signal transduction covers signaling from receptors located on the surface of the cell and signaling via molecules located within the cell. For signaling between cells, signal transduction is restricted to events at and within the receiving cell. 0.0013602
GO:0006813 potassium ion transport The directed movement of potassium ions (K+) into, out of or within a cell, or between cells, by means of some agent such as a transporter or pore. 0.0013703
GO:0006355 regulation of transcription, DNA-templated Any process that modulates the frequency, rate or extent of cellular DNA-templated transcription. 0.0015381
GO:0006470 protein dephosphorylation The process of removing one or more phosphoric residues from a protein. 0.0017294
GO:0034220 ion transmembrane transport A process in which an ion is transported across a membrane. 0.0020476
GO:0005992 trehalose biosynthetic process The chemical reactions and pathways resulting in the formation of trehalose, a disaccharide isomeric with sucrose and obtained from certain lichens and fungi. 0.0033010
GO:1901642 nucleoside transmembrane transport NA 0.0036043
GO:0006979 response to oxidative stress Any process that results in a change in state or activity of a cell or an organism (in terms of movement, secretion, enzyme production, gene expression, etc.) as a result of oxidative stress, a state often resulting from exposure to high levels of reactive oxygen species, e.g. superoxide anions, hydrogen peroxide (H2O2), and hydroxyl radicals. 0.0037189
GO:0043161 proteasome-mediated ubiquitin-dependent protein catabolic process The chemical reactions and pathways resulting in the breakdown of a protein or peptide by hydrolysis of its peptide bonds, initiated by the covalent attachment of ubiquitin, and mediated by the proteasome. 0.0053345
GO:0003341 cilium movement The directed, self-propelled movement of a cilium. 0.0080902
GO:0060271 cilium assembly The assembly of a cilium, a specialized eukaryotic organelle that consists of a filiform extrusion of the cell surface. Each cilium is bounded by an extrusion of the cytoplasmic membrane, and contains a regular longitudinal array of microtubules, anchored basally in a centriole. 0.0083101
GO:0046942 carboxylic acid transport The directed movement of carboxylic acids into, out of or within a cell, or between cells, by means of some agent such as a transporter or pore. Carboxylic acids are organic acids containing one or more carboxyl (COOH) groups or anions (COO-). 0.0093728

I think the GO graph is useful to see the relationship among the significant terms. But too large graphs are impossible to read. I don’t know how to split the graph in meaningful subgraphs.

showSigOfNodes(GOdata.BP, score(BP.weight01),
               firstSigNodes = sum(significant.elim),
               wantedNodes = sigTerms)

## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 229 
## Number of Edges = 439 
## 
## $complete.dag
## [1] "A graph with 229 nodes."

This is just a example of the most significant GO term:

showGroupDensity(GOdata.BP, names(score(BP.weight01))[which.min(score(BP.weight01))])

Molecular function

orderedTerms <- names(sort(score(MF.weight01)))
significant.weight01 <- score(MF.weight01)[orderedTerms] <= 0.01
significant.lea      <- score(MF.lea)[orderedTerms] <= 0.01
significant.elim     <- score(MF.elim)[orderedTerms] <= 0.01
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
MF.pvalue.sigTerms <- sigTerms

MF.all <- GenTable(GOdata.MF, elim=MF.elim, weight01=MF.weight01, lea=MF.lea,
                   orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))

kable(
   MF.all[MF.all$Significant > MF.all$Expected,],
   caption = "Most over-represented terms of the Molecular Function ontology.")
Most over-represented terms of the Molecular Function ontology.
GO.ID Term Annotated Significant Expected Rank in elim elim weight01 lea
GO:0005515 protein binding 2101 934 796.69 1 8.0e-14 1.2e-15 5.5e-16
GO:0005509 calcium ion binding 361 200 136.89 2 9.8e-12 9.8e-12 9.8e-12
GO:0005525 GTP binding 198 96 75.08 4 6.1e-05 6.1e-05 6.1e-05
GO:0004672 protein kinase activity 274 128 103.90 3 3.4e-05 7.9e-05 3.4e-05
GO:0003774 motor activity 131 70 49.67 6 0.00028 0.00012 0.00028
GO:0005524 ATP binding 786 335 298.05 5 0.00023 0.00023 0.00023
GO:0022857 transmembrane transporter activity 583 286 221.07 48 0.01338 0.00029 2.3e-09
GO:0004181 metallocarboxypeptidase activity 19 14 7.20 8 0.00033 0.00033 0.00033
GO:0005507 copper ion binding 18 13 6.83 9 0.00043 0.00043 0.00043
GO:0005200 structural constituent of cytoskeleton 17 11 6.45 10 0.00054 0.00054 0.00054
GO:0016491 oxidoreductase activity 417 189 158.12 53 0.01737 0.00060 0.00347
GO:0004252 serine-type endopeptidase activity 88 43 33.37 11 0.00087 0.00087 0.00087
GO:0005230 extracellular ligand-gated ion channel a… 54 29 20.48 38 0.00987 0.00106 0.00987
GO:0061630 ubiquitin protein ligase activity 27 17 10.24 14 0.00114 0.00114 0.00114
GO:0008138 protein tyrosine/serine/threonine phosph… 23 16 8.72 18 0.00195 0.00195 0.00195
GO:0003924 GTPase activity 138 68 52.33 20 0.00245 0.00245 0.00245
GO:0004198 calcium-dependent cysteine-type endopept… 28 17 10.62 21 0.00250 0.00250 0.00250
GO:0004601 peroxidase activity 34 19 12.89 39 0.01148 0.00296 0.01148
GO:0042626 ATPase activity, coupled to transmembran… 86 49 32.61 13 0.00093 0.00354 0.00093
GO:0005337 nucleoside transmembrane transporter act… 7 6 2.65 24 0.00386 0.00386 0.00386
GO:0004222 metalloendopeptidase activity 73 41 27.68 25 0.00446 0.00446 0.00446
GO:0016787 hydrolase activity 1382 655 524.05 22 0.00253 0.00474 4.2e-17
GO:0016715 oxidoreductase activity, acting on paire… 14 10 5.31 26 0.00539 0.00539 0.00539
GO:0016840 carbon-nitrogen lyase activity 8 7 3.03 28 0.00577 0.00577 0.00577
GO:0008017 microtubule binding 62 35 23.51 29 0.00652 0.00652 0.00652
GO:0005506 iron ion binding 72 40 27.30 16 0.00125 0.00663 0.00125
GO:0030248 cellulose binding 7 6 2.65 30 0.00666 0.00666 0.00666
GO:0005249 voltage-gated potassium channel activity 27 12 10.24 32 0.00749 0.00749 0.00749
GO:0004930 G protein-coupled receptor activity 180 84 68.26 17 0.00179 0.00765 0.00179
GO:0047631 ADP-ribose diphosphatase activity 6 5 2.28 34 0.00809 0.00809 0.00809
GO:0030246 carbohydrate binding 51 31 19.34 40 0.01189 0.01012 0.00082
GO:0016791 phosphatase activity 100 53 37.92 65 0.02349 0.01012 0.00079
GO:0008092 cytoskeletal protein binding 144 76 54.60 37 0.00914 0.01042 0.00031
GO:0008271 secondary active sulfate transmembrane t… 13 8 4.93 45 0.01317 0.01317 0.01317
GO:0020037 heme binding 95 49 36.02 47 0.01331 0.01331 0.01331
GO:0043565 sequence-specific DNA binding 146 61 55.36 91 0.04456 0.01433 0.04456
GO:0004568 chitinase activity 9 6 3.41 50 0.01471 0.01471 0.01471
GO:0004550 nucleoside diphosphate kinase activity 5 5 1.90 51 0.01607 0.01607 0.01607
kable(
   MF.all[MF.all$Significant < MF.all$Expected,],
   caption = "Most under-represented terms of the Molecular Function ontology.")
Most under-represented terms of the Molecular Function ontology.
GO.ID Term Annotated Significant Expected Rank in elim elim weight01 lea
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
                             lea=significant.lea,
                             elim=significant.elim)))

kable(data.frame(Term=Term(GOTERM[sigTerms]),
                 Definition=Definition(GOTERM[sigTerms]),
                 PValue=score(MF.weight01)[sigTerms]),
      caption = paste('Molecular function terms significantly associated with', VAR,
                      'according to all 3 algorithms', sep=' '))
Molecular function terms significantly associated with regime according to all 3 algorithms
Term Definition PValue
GO:0005515 protein binding Interacting selectively and non-covalently with any protein or protein complex (a complex of two or more proteins that may include other nonprotein molecules). 0.0000000
GO:0005509 calcium ion binding Interacting selectively and non-covalently with calcium ions (Ca2+). 0.0000000
GO:0005525 GTP binding Interacting selectively and non-covalently with GTP, guanosine triphosphate. 0.0000612
GO:0004672 protein kinase activity Catalysis of the phosphorylation of an amino acid residue in a protein, usually according to the reaction: a protein + ATP = a phosphoprotein + ADP. 0.0000787
GO:0003774 motor activity Catalysis of the generation of force resulting either in movement along a microfilament or microtubule, or in torque resulting in membrane scission, coupled to the hydrolysis of a nucleoside triphosphate. 0.0001188
GO:0005524 ATP binding Interacting selectively and non-covalently with ATP, adenosine 5’-triphosphate, a universally important coenzyme and enzyme regulator. 0.0002254
GO:0004181 metallocarboxypeptidase activity Catalysis of the hydrolysis of C-terminal amino acid residues from a polypeptide chain by a mechanism in which water acts as a nucleophile, one or two metal ions hold the water molecule in place, and charged amino acid side chains are ligands for the metal ions. 0.0003290
GO:0005507 copper ion binding Interacting selectively and non-covalently with copper (Cu) ions. 0.0004350
GO:0005200 structural constituent of cytoskeleton The action of a molecule that contributes to the structural integrity of a cytoskeletal structure. 0.0005415
GO:0004252 serine-type endopeptidase activity Catalysis of the hydrolysis of internal, alpha-peptide bonds in a polypeptide chain by a catalytic mechanism that involves a catalytic triad consisting of a serine nucleophile that is activated by a proton relay involving an acidic residue (e.g. aspartate or glutamate) and a basic residue (usually histidine). 0.0008730
GO:0005230 extracellular ligand-gated ion channel activity Enables the transmembrane transfer of an ion by a channel that opens when a specific extracellular ligand has been bound by the channel complex or one of its constituent parts. 0.0010573
GO:0061630 ubiquitin protein ligase activity Catalysis of the transfer of ubiquitin to a substrate protein via the reaction X-ubiquitin + S -> X + S-ubiquitin, where X is either an E2 or E3 enzyme, the X-ubiquitin linkage is a thioester bond, and the S-ubiquitin linkage is an amide bond: an isopeptide bond between the C-terminal glycine of ubiquitin and the epsilon-amino group of lysine residues in the substrate or, in the linear extension of ubiquitin chains, a peptide bond the between the C-terminal glycine and N-terminal methionine of ubiquitin residues. 0.0011377
GO:0008138 protein tyrosine/serine/threonine phosphatase activity Catalysis of the reactions: protein serine + H2O = protein serine + phosphate; protein threonine phosphate + H2O = protein threonine + phosphate; and protein tyrosine phosphate + H2O = protein tyrosine + phosphate. 0.0019535
GO:0003924 GTPase activity Catalysis of the reaction: GTP + H2O = GDP + phosphate. 0.0024524
GO:0004198 calcium-dependent cysteine-type endopeptidase activity Catalysis of the hydrolysis of nonterminal peptide bonds in a polypeptide chain by a mechanism using a cysteine residue at the enzyme active center, and requiring the presence of calcium. 0.0024962
GO:0042626 ATPase activity, coupled to transmembrane movement of substances Catalysis of the reaction: ATP + H2O = ADP + phosphate, to directly drive the active transport of a substance across a membrane. 0.0035393
GO:0005337 nucleoside transmembrane transporter activity Enables the transfer of a nucleoside, a nucleobase linked to either beta-D-ribofuranose (ribonucleoside) or 2-deoxy-beta-D-ribofuranose, (a deoxyribonucleotide) from one side of a membrane to the other. 0.0038616
GO:0004222 metalloendopeptidase activity Catalysis of the hydrolysis of internal, alpha-peptide bonds in a polypeptide chain by a mechanism in which water acts as a nucleophile, one or two metal ions hold the water molecule in place, and charged amino acid side chains are ligands for the metal ions. 0.0044621
GO:0016787 hydrolase activity Catalysis of the hydrolysis of various bonds, e.g. C-O, C-N, C-C, phosphoric anhydride bonds, etc. Hydrolase is the systematic name for any enzyme of EC class 3. 0.0047444
GO:0016715 oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen, reduced ascorbate as one donor, and incorporation of one atom of oxygen Catalysis of an oxidation-reduction (redox) reaction in which hydrogen or electrons are transferred from reduced ascorbate and one other donor, and one atom of oxygen is incorporated into one donor. 0.0053940
GO:0016840 carbon-nitrogen lyase activity Catalysis of the release of ammonia or one of its derivatives, with the formation of a double bond or ring. Enzymes with this activity may catalyze the actual elimination of the ammonia, amine or amide, e.g. CH-CH(-NH-R) = C=CH- + NH2-R. Others, however, catalyze elimination of another component, e.g. water, which is followed by spontaneous reactions that lead to breakage of the C-N bond, e.g. L-serine ammonia-lyase (EC:4.3.1.17), so that the overall reaction is C(-OH)-CH(-NH2) = CH2-CO- + NH3, i.e. an elimination with rearrangement. The sub-subclasses of EC:4.3 are the ammonia-lyases (EC:4.3.1), lyases acting on amides, amidines, etc. (EC:4.3.2), the amine-lyases (EC:4.3.3), and other carbon-nitrogen lyases (EC:4.3.99). 0.0057664
GO:0008017 microtubule binding Interacting selectively and non-covalently with microtubules, filaments composed of tubulin monomers. 0.0065179
GO:0005506 iron ion binding Interacting selectively and non-covalently with iron (Fe) ions. 0.0066340
GO:0030248 cellulose binding Interacting selectively and non-covalently with cellulose. 0.0066580
GO:0005249 voltage-gated potassium channel activity Enables the transmembrane transfer of a potassium ion by a voltage-gated channel. A voltage-gated channel is a channel whose open state is dependent on the voltage across the membrane in which it is embedded. 0.0074909
GO:0004930 G protein-coupled receptor activity Combining with an extracellular signal and transmitting the signal across the membrane by activating an associated G-protein; promotes the exchange of GDP for GTP on the alpha subunit of a heterotrimeric G-protein complex. 0.0076490
GO:0047631 ADP-ribose diphosphatase activity Catalysis of the reaction: ADP-ribose + H2O = AMP + D-ribose 5-phosphate. 0.0080946
showSigOfNodes(GOdata.MF, score(MF.weight01),
               firstSigNodes = sum(significant.elim),
               wantedNodes = sigTerms)

## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 145 
## Number of Edges = 195 
## 
## $complete.dag
## [1] "A graph with 145 nodes."
showGroupDensity(GOdata.MF, names(score(MF.weight01))[which.min(score(MF.weight01))])

Cellular component

orderedTerms <- names(sort(score(CC.weight01)))
significant.weight01 <- score(CC.weight01)[orderedTerms] <= 0.01
significant.lea      <- score(CC.lea)[orderedTerms] <= 0.01
significant.elim     <- score(CC.elim)[orderedTerms] <= 0.01
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
CC.pvalue.sigTerms <- sigTerms

CC.all <- GenTable(GOdata.CC, elim=CC.elim, weight01=CC.weight01, lea=CC.lea,
                   orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))
kable(
   CC.all[CC.all$Significant > CC.all$Expected,],
   caption = "Most over-represented terms of the Cellular Component ontology.")
Most over-represented terms of the Cellular Component ontology.
GO.ID Term Annotated Significant Expected Rank in elim elim weight01 lea
GO:0016021 integral component of membrane 885 427 316.73 1 1.2e-16 3.5e-15 1.3e-22
GO:0016020 membrane 1565 714 560.09 3 9.0e-06 5.7e-06 1.9e-26
GO:0005887 integral component of plasma membrane 118 61 42.23 2 8.0e-06 2.6e-05 6.8e-08
GO:0016459 myosin complex 33 23 11.81 4 4.4e-05 4.4e-05 4.4e-05
GO:0098800 inner mitochondrial membrane protein com… 21 14 7.52 11 0.00195 0.00017 0.00195
GO:0008076 voltage-gated potassium channel complex 13 8 4.65 6 0.00077 0.00077 0.00077
GO:0090575 RNA polymerase II transcription factor c… 23 17 8.23 8 0.00085 0.00100 0.00085
GO:0005576 extracellular region 140 65 50.10 10 0.00191 0.00263 0.00191
GO:0005886 plasma membrane 170 88 60.84 5 0.00064 0.00413 5.0e-11
GO:0005634 nucleus 531 216 190.04 14 0.00711 0.00938 0.03353
GO:0042555 MCM complex 10 6 3.58 19 0.01007 0.01007 0.01007
GO:0005743 mitochondrial inner membrane 33 22 11.81 20 0.01131 0.01131 0.00020
GO:0098796 membrane protein complex 145 71 51.89 37 0.02782 0.01285 0.00525
GO:0034704 calcium channel complex 5 5 1.79 24 0.01552 0.01552 0.01552
GO:0044430 cytoskeletal part 152 73 54.40 47 0.03285 0.01625 0.01538
GO:0005929 cilium 27 18 9.66 21 0.01151 0.01627 0.00174
GO:0016591 RNA polymerase II, holoenzyme 20 14 7.16 9 0.00172 0.01827 0.00172
GO:0005681 spliceosomal complex 12 6 4.29 27 0.01870 0.01870 0.01870
kable(
   CC.all[CC.all$Significant < CC.all$Expected,],
   caption = "Most under-represented terms of the Cellular Component ontology.")
Most under-represented terms of the Cellular Component ontology.
GO.ID Term Annotated Significant Expected Rank in elim elim weight01 lea
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
                             lea=significant.lea,
                             elim=significant.elim)))

kable(data.frame(Term=Term(GOTERM[sigTerms]),
                 Definition=Definition(GOTERM[sigTerms]),
                 PValue=score(CC.weight01)[sigTerms]),
      caption = paste('Cellular component terms significantly associated with', VAR,
                      'according to all 3 algorithms', sep=' '))
Cellular component terms significantly associated with regime according to all 3 algorithms
Term Definition PValue
GO:0016021 integral component of membrane The component of a membrane consisting of the gene products and protein complexes having at least some part of their peptide sequence embedded in the hydrophobic region of the membrane. 0.0000000
GO:0016020 membrane A lipid bilayer along with all the proteins and protein complexes embedded in it an attached to it. 0.0000057
GO:0005887 integral component of plasma membrane The component of the plasma membrane consisting of the gene products and protein complexes having at least some part of their peptide sequence embedded in the hydrophobic region of the membrane. 0.0000258
GO:0016459 myosin complex A protein complex, formed of one or more myosin heavy chains plus associated light chains and other proteins, that functions as a molecular motor; uses the energy of ATP hydrolysis to move actin filaments or to move vesicles or other cargo on fixed actin filaments; has magnesium-ATPase activity and binds actin. Myosin classes are distinguished based on sequence features of the motor, or head, domain, but also have distinct tail regions that are believed to bind specific cargoes. 0.0000444
GO:0098800 inner mitochondrial membrane protein complex Any protein complex that is part of the inner mitochondrial membrane. 0.0001717
GO:0008076 voltage-gated potassium channel complex A protein complex that forms a transmembrane channel through which potassium ions may cross a cell membrane in response to changes in membrane potential. 0.0007746
GO:0090575 RNA polymerase II transcription factor complex A transcription factor complex that acts at a regulatory region of a gene transcribed by RNA polymerase II. 0.0010016
GO:0005576 extracellular region The space external to the outermost structure of a cell. For cells without external protective or external encapsulating structures this refers to space outside of the plasma membrane. This term covers the host cell environment outside an intracellular parasite. 0.0026307
GO:0005886 plasma membrane The membrane surrounding a cell that separates the cell from its external environment. It consists of a phospholipid bilayer and associated proteins. 0.0041281
showSigOfNodes(GOdata.CC, score(CC.weight01),
               firstSigNodes = sum(significant.elim),
               wantedNodes = sigTerms)

## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 75 
## Number of Edges = 132 
## 
## $complete.dag
## [1] "A graph with 75 nodes."
showGroupDensity(GOdata.CC, names(score(CC.weight01))[which.min(score(CC.weight01))])

Using the portion of variance explained by regime

Building the topGO object

I need to generate the topGO objects again, using the alternative gene ordering, based on the proportion of expression-level variance explained by regime. I miss a way to inform the topGOdata object that the score in this case is reversed, with respect to p-values: the higher it is, the more differentially expressed the gene is. To make sure that GO terms are tested in the same way than when using p-values, I will just reverse the proportion of variance explained by regime to its complement. Taking this into account, the subset of interesting genes (selection() function) must be defined as the lowest 10% scores, which are the 10% genes with largest portion of variance explained by regime.

selection <- function(allScores) {return(allScores <= quantile(allScores, probs = 0.10))}
GOdataVar.BP <- new('topGOdata',
   description = 'Differential expression among Brachionus plicatilis populations.',
   ontology = 'BP', 
   allGenes = 1.0 - Variance,
   annot = annFUN.gene2GO,
   gene2GO = allgenes2GO,
   geneSelectionFun = selection,
   nodeSize = 5)

GOdataVar.MF <- new('topGOdata',
   description = 'Differential expression among Brachionus plicatilis populations.',
   ontology = 'MF',
   allGenes = 1.0 - Variance,
   annot = annFUN.gene2GO,
   gene2GO = allgenes2GO,
   geneSelectionFun = selection,
   nodeSize = 5)

GOdataVar.CC <- new('topGOdata',
   description = 'Differential expression among Brachionus plicatilis populations.',
   ontology = 'CC',
   allGenes = 1.0 - Variance,
   annot = annFUN.gene2GO,
   gene2GO = allgenes2GO,
   geneSelectionFun = selection,
   nodeSize = 5)

library(knitr)
DataSummary <- data.frame(ontology = c('BP', 'MF', 'CC'),
   Num_Genes = sapply(list(GOdataVar.BP, GOdataVar.MF, GOdataVar.CC), numGenes),
   Num_GO_terms = sapply(list(GOdataVar.BP, GOdataVar.MF, GOdataVar.CC), function(x) length(usedGO(x))))
kable(DataSummary, caption='Number of feasible genes or transcripts and number of GO terms used in each data set.')
rm(DataSummary)
Number of feasible genes or transcripts and number of GO terms used in each data set.
ontology Num_Genes Num_GO_terms
BP 15162 1154
MF 17429 576
CC 12940 291

Running the tests

BPvar.elim     <- runTest(GOdataVar.BP, algorithm = "elim",     statistic = "ks")
BPvar.weight01 <- runTest(GOdataVar.BP, algorithm = "weight01", statistic = "ks")
BPvar.lea      <- runTest(GOdataVar.BP, algorithm = "lea",      statistic = "ks")
MFvar.elim     <- runTest(GOdataVar.MF, algorithm = "elim",     statistic = "ks")
MFvar.weight01 <- runTest(GOdataVar.MF, algorithm = "weight01", statistic = "ks")
MFvar.lea      <- runTest(GOdataVar.MF, algorithm = "lea",      statistic = "ks")
CCvar.elim     <- runTest(GOdataVar.CC, algorithm = "elim",     statistic = "ks")
CCvar.weight01 <- runTest(GOdataVar.CC, algorithm = "weight01", statistic = "ks")
CCvar.lea      <- runTest(GOdataVar.CC, algorithm = "lea",      statistic = "ks")

ResultsSummary <- data.frame(ontology = rep(c("BP", "MF", "CC"), each = 3),
   algorithm = rep(c("elim", "weight01", "lea"), 3),
   TermsTested = sapply(list(BPvar.elim, BPvar.weight01, BPvar.lea, MFvar.elim, MFvar.weight01, MFvar.lea, CCvar.elim, CCvar.weight01, CCvar.lea), function(X) length(score(X))),
   Significant = sapply(list(BPvar.elim, BPvar.weight01, BPvar.lea, MFvar.elim, MFvar.weight01, MFvar.lea, CCvar.elim, CCvar.weight01, CCvar.lea), function(X) sum(score(X) < 0.01)))

kable(ResultsSummary, caption="Number of non-trivial terms tested and those with a score (not corrected p-value) lower than 0.01.")
Number of non-trivial terms tested and those with a score (not corrected p-value) lower than 0.01.
ontology algorithm TermsTested Significant
BP elim 1154 55
BP weight01 1154 34
BP lea 1154 102
MF elim 576 53
MF weight01 576 39
MF lea 576 100
CC elim 291 29
CC weight01 291 16
CC lea 291 57
rm(ResultsSummary)

Results

Biological process

orderedTerms <- names(sort(score(BPvar.weight01)))
significant.weight01 <- score(BPvar.weight01)[orderedTerms] <= 0.01
significant.lea      <- score(BPvar.lea)[orderedTerms] <= 0.01
significant.elim     <- score(BPvar.elim)[orderedTerms] <= 0.01
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
BP.variance.sigTerms <- sigTerms

BPvar.all <- GenTable(GOdataVar.BP, elim=BPvar.elim, weight01=BPvar.weight01, lea=BPvar.lea,
                      orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))

kable(
   BPvar.all[BPvar.all$Significant > BPvar.all$Expected,],
   caption = "Most over-represented terms of the Biological Process ontology.")
Most over-represented terms of the Biological Process ontology.
GO.ID Term Annotated Significant Expected Rank in elim elim weight01 lea
1 GO:0055085 transmembrane transport 500 90 47.85 1 2.1e-12 3.1e-12 1.8e-16
2 GO:0006508 proteolysis 442 74 42.30 2 4.9e-09 2.7e-09 2.0e-12
3 GO:0007186 G protein-coupled receptor signaling pat… 206 27 19.71 4 1.2e-06 1.1e-06 1.2e-06
4 GO:0006355 regulation of transcription, DNA-templat… 355 43 33.97 3 3.7e-07 6.9e-06 3.7e-07
5 GO:0034220 ion transmembrane transport 147 21 14.07 7 6.9e-05 1.1e-05 6.9e-05
6 GO:0006468 protein phosphorylation 276 39 26.41 6 1.1e-05 3.8e-05 1.1e-05
7 GO:0006813 potassium ion transport 56 9 5.36 5 6.7e-06 0.00019 6.7e-06
8 GO:0006511 ubiquitin-dependent protein catabolic pr… 102 12 9.76 8 0.00019 0.00019 2.3e-05
9 GO:0006289 nucleotide-excision repair 13 5 1.24 9 0.00021 0.00021 0.00021
10 GO:0055114 oxidation-reduction process 376 47 35.98 15 0.00094 0.00028 0.00094
11 GO:0060271 cilium assembly 46 7 4.40 11 0.00054 0.00054 1.0e-05
12 GO:0003341 cilium movement 10 2 0.96 13 0.00070 0.00070 0.00070
13 GO:0005992 trehalose biosynthetic process 6 5 0.57 14 0.00074 0.00074 0.00074
15 GO:0007165 signal transduction 544 61 52.06 23 0.00350 0.00102 5.4e-08
17 GO:0006030 chitin metabolic process 74 13 7.08 10 0.00033 0.00165 0.00033
18 GO:0048870 cell motility 14 5 1.34 67 0.01357 0.00181 0.01357
19 GO:0006367 transcription initiation from RNA polyme… 13 5 1.24 20 0.00273 0.00273 0.00273
20 GO:0051260 protein homooligomerization 25 4 2.39 22 0.00301 0.00301 0.00301
21 GO:0007018 microtubule-based movement 115 13 11.01 16 0.00113 0.00358 2.2e-05
22 GO:0007160 cell-matrix adhesion 16 7 1.53 24 0.00366 0.00366 0.00366
25 GO:0016579 protein deubiquitination 40 6 3.83 31 0.00400 0.00400 0.00400
26 GO:0046039 GTP metabolic process 5 1 0.48 33 0.00417 0.00417 0.00417
27 GO:0046129 purine ribonucleoside biosynthetic proce… 5 1 0.48 34 0.00417 0.00417 0.00417
28 GO:1901642 nucleoside transmembrane transport 7 2 0.67 35 0.00418 0.00418 0.00418
30 GO:0006744 ubiquinone biosynthetic process 5 1 0.48 41 0.00564 0.00564 0.00564
31 GO:0071805 potassium ion transmembrane transport 19 6 1.82 58 0.01074 0.00590 0.01074
32 GO:0006520 cellular amino acid metabolic process 117 15 11.20 862 0.68061 0.00819 0.95060
33 GO:0006241 CTP biosynthetic process 5 1 0.48 48 0.00881 0.00881 0.00881
34 GO:0016539 intein-mediated protein splicing 5 1 0.48 55 0.00964 0.00964 0.00964
35 GO:0005975 carbohydrate metabolic process 202 35 19.33 80 0.01684 0.01164 0.04834
36 GO:0006470 protein dephosphorylation 58 7 5.55 17 0.00131 0.01234 0.00131
37 GO:0006165 nucleoside diphosphate phosphorylation 16 3 1.53 389 0.21287 0.01247 0.21287
38 GO:0009206 purine ribonucleoside triphosphate biosy… 27 4 2.58 499 0.29722 0.01248 0.29722
39 GO:0000245 spliceosomal complex assembly 7 1 0.67 69 0.01366 0.01366 0.01366
40 GO:0008285 negative regulation of cell proliferatio… 6 2 0.57 70 0.01381 0.01381 0.01381
43 GO:0042073 intraciliary transport 11 2 1.05 294 0.14469 0.01624 0.14469
44 GO:0042398 cellular modified amino acid biosyntheti… 17 3 1.63 638 0.42953 0.01723 0.42953
45 GO:0007017 microtubule-based process 195 26 18.66 195 0.07365 0.01768 0.03148
46 GO:0044237 cellular metabolic process 2395 244 229.20 623 0.40930 0.01828 1.3e-17
47 GO:0006689 ganglioside catabolic process 5 2 0.48 87 0.01839 0.01839 0.01839
48 GO:0006383 transcription by RNA polymerase III 8 1 0.77 95 0.01855 0.01855 0.01855
49 GO:0009152 purine ribonucleotide biosynthetic proce… 44 5 4.21 537 0.32672 0.01929 0.32672
50 GO:0006914 autophagy 21 3 2.01 30 0.00395 0.01940 0.00395
51 GO:0006626 protein targeting to mitochondrion 9 1 0.86 83 0.01770 0.02082 0.01770
52 GO:0006821 chloride transport 8 1 0.77 101 0.02154 0.02154 0.02154
53 GO:0030163 protein catabolic process 123 12 11.77 21 0.00298 0.02225 1.8e-07
55 GO:0006979 response to oxidative stress 26 8 2.49 111 0.02276 0.02276 0.02276
kable(
   BPvar.all[BPvar.all$Significant < BPvar.all$Expected,],
   caption = "Most under-represented terms of the Biological Process ontology.")
Most under-represented terms of the Biological Process ontology.
GO.ID Term Annotated Significant Expected Rank in elim elim weight01 lea
14 GO:0006412 translation 195 3 18.66 383 0.20869 0.00087 0.20869
16 GO:0000398 mRNA splicing, via spliceosome 46 1 4.40 12 0.00070 0.00129 0.00070
23 GO:0007156 homophilic cell adhesion via plasma memb… 21 1 2.01 26 0.00375 0.00375 0.00375
24 GO:0043161 proteasome-mediated ubiquitin-dependent … 45 4 4.31 29 0.00388 0.00388 0.00388
29 GO:0044271 cellular nitrogen compound biosynthetic … 783 73 74.93 215 0.08360 0.00453 0.22562
41 GO:0035556 intracellular signal transduction 173 16 16.56 125 0.03154 0.01456 0.25702
42 GO:0051726 regulation of cell cycle 43 4 4.12 270 0.12315 0.01487 0.12315
54 GO:0070836 caveola assembly 7 0 0.67 103 0.02251 0.02251 0.02251
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
                             lea=significant.lea,
                             elim=significant.elim)))

kable(data.frame(Term=Term(GOTERM[sigTerms]),
                 Definition=Definition(GOTERM[sigTerms]),
                 PValue=score(BPvar.weight01)[sigTerms]),
      caption = paste('Biological process terms significantly associated with', VAR,
                      'according to all 3 algorithms', sep=' '))
Biological process terms significantly associated with regime according to all 3 algorithms
Term Definition PValue
GO:0055085 transmembrane transport The process in which a solute is transported across a lipid bilayer, from one side of a membrane to the other. 0.0000000
GO:0006508 proteolysis The hydrolysis of proteins into smaller polypeptides and/or amino acids by cleavage of their peptide bonds. 0.0000000
GO:0007186 G protein-coupled receptor signaling pathway A series of molecular signals that proceeds with an activated receptor promoting the exchange of GDP for GTP on the alpha-subunit of an associated heterotrimeric G-protein complex. The GTP-bound activated alpha-G-protein then dissociates from the beta- and gamma-subunits to further transmit the signal within the cell. The pathway begins with receptor-ligand interaction, or for basal GPCR signaling the pathway begins with the receptor activating its G protein in the absence of an agonist, and ends with regulation of a downstream cellular process, e.g. transcription. The pathway can start from the plasma membrane, Golgi or nuclear membrane (PMID:24568158 and PMID:16902576). 0.0000011
GO:0006355 regulation of transcription, DNA-templated Any process that modulates the frequency, rate or extent of cellular DNA-templated transcription. 0.0000069
GO:0034220 ion transmembrane transport A process in which an ion is transported across a membrane. 0.0000114
GO:0006468 protein phosphorylation The process of introducing a phosphate group on to a protein. 0.0000380
GO:0006813 potassium ion transport The directed movement of potassium ions (K+) into, out of or within a cell, or between cells, by means of some agent such as a transporter or pore. 0.0001859
GO:0006511 ubiquitin-dependent protein catabolic process The chemical reactions and pathways resulting in the breakdown of a protein or peptide by hydrolysis of its peptide bonds, initiated by the covalent attachment of a ubiquitin group, or multiple ubiquitin groups, to the protein. 0.0001901
GO:0006289 nucleotide-excision repair A DNA repair process in which a small region of the strand surrounding the damage is removed from the DNA helix as an oligonucleotide. The small gap left in the DNA helix is filled in by the sequential action of DNA polymerase and DNA ligase. Nucleotide excision repair recognizes a wide range of substrates, including damage caused by UV irradiation (pyrimidine dimers and 6-4 photoproducts) and chemicals (intrastrand cross-links and bulky adducts). 0.0002075
GO:0055114 oxidation-reduction process A metabolic process that results in the removal or addition of one or more electrons to or from a substance, with or without the concomitant removal or addition of a proton or protons. 0.0002837
GO:0060271 cilium assembly The assembly of a cilium, a specialized eukaryotic organelle that consists of a filiform extrusion of the cell surface. Each cilium is bounded by an extrusion of the cytoplasmic membrane, and contains a regular longitudinal array of microtubules, anchored basally in a centriole. 0.0005423
GO:0003341 cilium movement The directed, self-propelled movement of a cilium. 0.0007025
GO:0005992 trehalose biosynthetic process The chemical reactions and pathways resulting in the formation of trehalose, a disaccharide isomeric with sucrose and obtained from certain lichens and fungi. 0.0007430
GO:0007165 signal transduction The cellular process in which a signal is conveyed to trigger a change in the activity or state of a cell. Signal transduction begins with reception of a signal (e.g. a ligand binding to a receptor or receptor activation by a stimulus such as light), or for signal transduction in the absence of ligand, signal-withdrawal or the activity of a constitutively active receptor. Signal transduction ends with regulation of a downstream cellular process, e.g. regulation of transcription or regulation of a metabolic process. Signal transduction covers signaling from receptors located on the surface of the cell and signaling via molecules located within the cell. For signaling between cells, signal transduction is restricted to events at and within the receiving cell. 0.0010235
GO:0000398 mRNA splicing, via spliceosome The joining together of exons from one or more primary transcripts of messenger RNA (mRNA) and the excision of intron sequences, via a spliceosomal mechanism, so that mRNA consisting only of the joined exons is produced. 0.0012854
GO:0006030 chitin metabolic process The chemical reactions and pathways involving chitin, a linear polysaccharide consisting of beta-(1->4)-linked N-acetyl-D-glucosamine residues. 0.0016545
GO:0006367 transcription initiation from RNA polymerase II promoter Any process involved in the assembly of the RNA polymerase II preinitiation complex (PIC) at an RNA polymerase II promoter region of a DNA template, resulting in the subsequent synthesis of RNA from that promoter. The initiation phase includes PIC assembly and the formation of the first few bonds in the RNA chain, including abortive initiation, which occurs when the first few nucleotides are repeatedly synthesized and then released. Promoter clearance, or release, is the transition between the initiation and elongation phases of transcription. 0.0027295
GO:0051260 protein homooligomerization The process of creating protein oligomers, compounds composed of a small number, usually between three and ten, of identical component monomers. Oligomers may be formed by the polymerization of a number of monomers or the depolymerization of a large protein polymer. 0.0030076
GO:0007018 microtubule-based movement A microtubule-based process that results in the movement of organelles, other microtubules, or other cellular components. Examples include motor-driven movement along microtubules and movement driven by polymerization or depolymerization of microtubules. 0.0035823
GO:0007160 cell-matrix adhesion The binding of a cell to the extracellular matrix via adhesion molecules. 0.0036621
GO:0007156 homophilic cell adhesion via plasma membrane adhesion molecules The attachment of a plasma membrane adhesion molecule in one cell to an identical molecule in an adjacent cell. 0.0037547
GO:0043161 proteasome-mediated ubiquitin-dependent protein catabolic process The chemical reactions and pathways resulting in the breakdown of a protein or peptide by hydrolysis of its peptide bonds, initiated by the covalent attachment of ubiquitin, and mediated by the proteasome. 0.0038846
GO:0016579 protein deubiquitination The removal of one or more ubiquitin groups from a protein. 0.0040033
GO:0046039 GTP metabolic process The chemical reactions and pathways involving GTP, guanosine triphosphate. 0.0041658
GO:0046129 purine ribonucleoside biosynthetic process The chemical reactions and pathways resulting in the formation of any purine ribonucleoside, a nucleoside in which purine base is linked to a ribose (beta-D-ribofuranose) molecule. 0.0041658
GO:1901642 nucleoside transmembrane transport NA 0.0041829
GO:0006744 ubiquinone biosynthetic process The chemical reactions and pathways resulting in the formation of ubiquinone, a lipid-soluble electron-transporting coenzyme. 0.0056367
GO:0006241 CTP biosynthetic process The chemical reactions and pathways resulting in the formation of CTP, cytidine 5’-triphosphate. 0.0088100
GO:0016539 intein-mediated protein splicing The removal of an internal amino acid sequence (an intein) from a protein during protein maturation; the excision of inteins is precise and the N- and C-terminal exteins are joined by a normal peptide bond. Protein splicing involves 4 nucleophilic displacements by the 3 conserved splice junction residues. 0.0096428
showSigOfNodes(GOdataVar.BP, score(BPvar.weight01),
               firstSigNodes = sum(significant.elim),
               wantedNodes = sigTerms)

## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 327 
## Number of Edges = 619 
## 
## $complete.dag
## [1] "A graph with 327 nodes."

Below I plot variance portion, but for the termo found most significant when using p-values, for comparison.

showGroupDensity(GOdataVar.BP, names(score(BP.weight01))[which.min(score(BP.weight01))])

Molecular function

orderedTerms <- names(sort(score(MFvar.weight01)))
significant.weight01 <- score(MFvar.weight01)[orderedTerms] <= 0.01
significant.lea      <- score(MFvar.lea)[orderedTerms] <= 0.01
significant.elim     <- score(MFvar.elim)[orderedTerms] <= 0.01
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
MF.variance.sigTerms <- sigTerms

MFvar.all <- GenTable(GOdataVar.MF, elim=MFvar.elim, weight01=MFvar.weight01, lea=MFvar.lea,
                      orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))
kable(
   MFvar.all[MFvar.all$Significant > MFvar.all$Expected,],
   caption = "Most over-represented terms of the Molecular Function ontology.")
Most over-represented terms of the Molecular Function ontology.
GO.ID Term Annotated Significant Expected Rank in elim elim weight01 lea
1 GO:0005515 protein binding 2101 244 209.27 1 1.8e-21 3.9e-24 2.0e-22
2 GO:0005509 calcium ion binding 361 69 35.96 2 2.9e-18 2.9e-18 2.9e-18
3 GO:0003774 motor activity 131 21 13.05 3 3.5e-07 3.5e-07 5.1e-07
4 GO:0004930 G protein-coupled receptor activity 180 21 17.93 4 7.2e-06 2.1e-05 2.0e-06
5 GO:0005525 GTP binding 198 26 19.72 5 2.8e-05 2.8e-05 2.8e-05
6 GO:0004672 protein kinase activity 274 37 27.29 7 0.00012 4.0e-05 0.00012
7 GO:0005230 extracellular ligand-gated ion channel a… 54 10 5.38 8 0.00018 4.5e-05 0.00018
8 GO:0022857 transmembrane transporter activity 583 94 58.07 29 0.00413 0.00015 2.8e-14
9 GO:0005524 ATP binding 786 80 78.29 9 0.00020 0.00020 0.00020
10 GO:0004888 transmembrane signaling receptor activit… 251 33 25.00 25 0.00236 0.00021 1.3e-07
11 GO:0004181 metallocarboxypeptidase activity 19 7 1.89 11 0.00032 0.00032 0.00032
12 GO:0005507 copper ion binding 18 5 1.79 13 0.00068 0.00068 0.00068
13 GO:0004252 serine-type endopeptidase activity 88 22 8.77 16 0.00141 0.00141 0.00141
14 GO:0005249 voltage-gated potassium channel activity 27 4 2.69 17 0.00142 0.00142 0.00142
15 GO:0042626 ATPase activity, coupled to transmembran… 86 14 8.57 32 0.00474 0.00183 0.00474
16 GO:0061630 ubiquitin protein ligase activity 27 3 2.69 21 0.00211 0.00211 0.00211
18 GO:0005200 structural constituent of cytoskeleton 17 8 1.69 23 0.00216 0.00216 0.00216
19 GO:0003924 GTPase activity 138 19 13.75 24 0.00233 0.00233 0.00233
20 GO:0030246 carbohydrate binding 51 13 5.08 39 0.00624 0.00262 0.00031
21 GO:0008061 chitin binding 77 12 7.67 26 0.00273 0.00273 0.00273
22 GO:0036459 thiol-dependent ubiquitinyl hydrolase ac… 43 6 4.28 27 0.00293 0.00370 0.00293
23 GO:0003777 microtubule motor activity 98 10 9.76 28 0.00410 0.00410 0.00410
25 GO:0008047 enzyme activator activity 51 6 5.08 60 0.01276 0.00434 0.00239
26 GO:0016840 carbon-nitrogen lyase activity 8 1 0.80 31 0.00461 0.00461 0.00461
27 GO:0004550 nucleoside diphosphate kinase activity 5 1 0.50 33 0.00488 0.00488 0.00488
28 GO:0005337 nucleoside transmembrane transporter act… 7 2 0.70 34 0.00491 0.00491 0.00491
29 GO:0047631 ADP-ribose diphosphatase activity 6 2 0.60 35 0.00568 0.00568 0.00568
30 GO:0004983 neuropeptide Y receptor activity 9 1 0.90 38 0.00620 0.00620 0.00620
31 GO:0051015 actin filament binding 30 5 2.99 40 0.00624 0.00624 0.00624
32 GO:0070577 lysine-acetylated histone binding 5 1 0.50 41 0.00687 0.00687 0.00687
33 GO:0016504 peptidase activator activity 5 1 0.50 42 0.00687 0.00687 0.00687
34 GO:0070628 proteasome binding 5 1 0.50 43 0.00687 0.00687 0.00687
35 GO:0070006 metalloaminopeptidase activity 5 2 0.50 46 0.00750 0.00750 0.00750
36 GO:0016787 hydrolase activity 1382 215 137.65 14 0.00117 0.00766 2.7e-17
37 GO:0016772 transferase activity, transferring phosp… 459 53 45.72 96 0.02951 0.00800 6.6e-06
38 GO:0005328 neurotransmitter:sodium symporter activi… 18 5 1.79 48 0.00861 0.00861 0.00861
39 GO:0004198 calcium-dependent cysteine-type endopept… 28 4 2.79 51 0.00901 0.00901 0.00901
41 GO:0005267 potassium channel activity 42 9 4.18 55 0.01126 0.01126 7.9e-05
43 GO:0015293 symporter activity 32 6 3.19 81 0.02507 0.01227 0.00184
44 GO:0003700 DNA-binding transcription factor activit… 158 17 15.74 95 0.02897 0.01307 0.02897
45 GO:0030248 cellulose binding 7 1 0.70 62 0.01386 0.01386 0.01386
46 GO:0004867 serine-type endopeptidase inhibitor acti… 9 2 0.90 63 0.01419 0.01419 0.01419
47 GO:0008092 cytoskeletal protein binding 144 19 14.34 71 0.01797 0.01517 0.00059
48 GO:0022848 acetylcholine-gated cation-selective cha… 16 2 1.59 65 0.01636 0.01636 0.01636
49 GO:0004096 catalase activity 15 2 1.49 70 0.01677 0.01677 0.01677
50 GO:0020037 heme binding 95 17 9.46 72 0.01859 0.01859 0.01859
52 GO:0015081 sodium ion transmembrane transporter act… 101 17 10.06 304 0.27297 0.02355 0.27297
53 GO:0015491 cation:cation antiporter activity 10 4 1.00 53 0.00920 0.02363 0.00920
kable(
   MFvar.all[MFvar.all$Significant < MFvar.all$Expected,],
   caption = "Most under-represented terms of the Molecular Function ontology.")
Most under-represented terms of the Molecular Function ontology.
GO.ID Term Annotated Significant Expected Rank in elim elim weight01 lea
17 GO:0043565 sequence-specific DNA binding 146 11 14.54 52 0.00907 0.00213 0.00907
24 GO:0003735 structural constituent of ribosome 113 2 11.26 30 0.00432 0.00432 0.00432
40 GO:0008017 microtubule binding 62 6 6.18 54 0.01005 0.01005 0.01005
42 GO:0004298 threonine-type endopeptidase activity 15 0 1.49 57 0.01164 0.01164 0.01164
51 GO:0003712 transcription coregulator activity 42 3 4.18 44 0.00730 0.02015 0.00730
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
                             lea=significant.lea,
                             elim=significant.elim)))

kable(data.frame(Term=Term(GOTERM[sigTerms]),
                 Definition=Definition(GOTERM[sigTerms]),
                 PValue=score(MFvar.weight01)[sigTerms]),
      caption = paste('Molecular functions terms significantly associated with', VAR,
                      'according to all 3 algorithms', sep=' '))
Molecular functions terms significantly associated with regime according to all 3 algorithms
Term Definition PValue
GO:0005515 protein binding Interacting selectively and non-covalently with any protein or protein complex (a complex of two or more proteins that may include other nonprotein molecules). 0.0000000
GO:0005509 calcium ion binding Interacting selectively and non-covalently with calcium ions (Ca2+). 0.0000000
GO:0003774 motor activity Catalysis of the generation of force resulting either in movement along a microfilament or microtubule, or in torque resulting in membrane scission, coupled to the hydrolysis of a nucleoside triphosphate. 0.0000004
GO:0004930 G protein-coupled receptor activity Combining with an extracellular signal and transmitting the signal across the membrane by activating an associated G-protein; promotes the exchange of GDP for GTP on the alpha subunit of a heterotrimeric G-protein complex. 0.0000206
GO:0005525 GTP binding Interacting selectively and non-covalently with GTP, guanosine triphosphate. 0.0000283
GO:0004672 protein kinase activity Catalysis of the phosphorylation of an amino acid residue in a protein, usually according to the reaction: a protein + ATP = a phosphoprotein + ADP. 0.0000399
GO:0005230 extracellular ligand-gated ion channel activity Enables the transmembrane transfer of an ion by a channel that opens when a specific extracellular ligand has been bound by the channel complex or one of its constituent parts. 0.0000446
GO:0022857 transmembrane transporter activity Enables the transfer of a substance, usually a specific substance or a group of related substances, from one side of a membrane to the other. 0.0001541
GO:0005524 ATP binding Interacting selectively and non-covalently with ATP, adenosine 5’-triphosphate, a universally important coenzyme and enzyme regulator. 0.0001971
GO:0004888 transmembrane signaling receptor activity Combining with an extracellular or intracellular signal and transmitting the signal from one side of the membrane to the other to initiate a change in cell activity or state as part of signal transduction. 0.0002077
GO:0004181 metallocarboxypeptidase activity Catalysis of the hydrolysis of C-terminal amino acid residues from a polypeptide chain by a mechanism in which water acts as a nucleophile, one or two metal ions hold the water molecule in place, and charged amino acid side chains are ligands for the metal ions. 0.0003201
GO:0005507 copper ion binding Interacting selectively and non-covalently with copper (Cu) ions. 0.0006774
GO:0004252 serine-type endopeptidase activity Catalysis of the hydrolysis of internal, alpha-peptide bonds in a polypeptide chain by a catalytic mechanism that involves a catalytic triad consisting of a serine nucleophile that is activated by a proton relay involving an acidic residue (e.g. aspartate or glutamate) and a basic residue (usually histidine). 0.0014114
GO:0005249 voltage-gated potassium channel activity Enables the transmembrane transfer of a potassium ion by a voltage-gated channel. A voltage-gated channel is a channel whose open state is dependent on the voltage across the membrane in which it is embedded. 0.0014195
GO:0042626 ATPase activity, coupled to transmembrane movement of substances Catalysis of the reaction: ATP + H2O = ADP + phosphate, to directly drive the active transport of a substance across a membrane. 0.0018293
GO:0061630 ubiquitin protein ligase activity Catalysis of the transfer of ubiquitin to a substrate protein via the reaction X-ubiquitin + S -> X + S-ubiquitin, where X is either an E2 or E3 enzyme, the X-ubiquitin linkage is a thioester bond, and the S-ubiquitin linkage is an amide bond: an isopeptide bond between the C-terminal glycine of ubiquitin and the epsilon-amino group of lysine residues in the substrate or, in the linear extension of ubiquitin chains, a peptide bond the between the C-terminal glycine and N-terminal methionine of ubiquitin residues. 0.0021109
GO:0043565 sequence-specific DNA binding Interacting selectively and non-covalently with DNA of a specific nucleotide composition, e.g. GC-rich DNA binding, or with a specific sequence motif or type of DNA e.g. promotor binding or rDNA binding. 0.0021341
GO:0005200 structural constituent of cytoskeleton The action of a molecule that contributes to the structural integrity of a cytoskeletal structure. 0.0021609
GO:0003924 GTPase activity Catalysis of the reaction: GTP + H2O = GDP + phosphate. 0.0023264
GO:0030246 carbohydrate binding Interacting selectively and non-covalently with any carbohydrate, which includes monosaccharides, oligosaccharides and polysaccharides as well as substances derived from monosaccharides by reduction of the carbonyl group (alditols), by oxidation of one or more hydroxy groups to afford the corresponding aldehydes, ketones, or carboxylic acids, or by replacement of one or more hydroxy group(s) by a hydrogen atom. Cyclitols are generally not regarded as carbohydrates. 0.0026230
GO:0008061 chitin binding Interacting selectively and non-covalently with chitin, a linear polysaccharide consisting of beta-(1->4)-linked N-acetyl-D-glucosamine residues. 0.0027273
GO:0036459 thiol-dependent ubiquitinyl hydrolase activity Catalysis of the thiol-dependent hydrolysis of an ester, thioester, amide, peptide or isopeptide bond formed by the C-terminal glycine of ubiquitin. 0.0036965
GO:0003777 microtubule motor activity Catalysis of movement along a microtubule, coupled to the hydrolysis of a nucleoside triphosphate (usually ATP). 0.0041032
GO:0003735 structural constituent of ribosome The action of a molecule that contributes to the structural integrity of the ribosome. 0.0043247
GO:0016840 carbon-nitrogen lyase activity Catalysis of the release of ammonia or one of its derivatives, with the formation of a double bond or ring. Enzymes with this activity may catalyze the actual elimination of the ammonia, amine or amide, e.g. CH-CH(-NH-R) = C=CH- + NH2-R. Others, however, catalyze elimination of another component, e.g. water, which is followed by spontaneous reactions that lead to breakage of the C-N bond, e.g. L-serine ammonia-lyase (EC:4.3.1.17), so that the overall reaction is C(-OH)-CH(-NH2) = CH2-CO- + NH3, i.e. an elimination with rearrangement. The sub-subclasses of EC:4.3 are the ammonia-lyases (EC:4.3.1), lyases acting on amides, amidines, etc. (EC:4.3.2), the amine-lyases (EC:4.3.3), and other carbon-nitrogen lyases (EC:4.3.99). 0.0046139
GO:0004550 nucleoside diphosphate kinase activity Catalysis of the reaction: ATP + nucleoside diphosphate = ADP + nucleoside triphosphate. 0.0048829
GO:0005337 nucleoside transmembrane transporter activity Enables the transfer of a nucleoside, a nucleobase linked to either beta-D-ribofuranose (ribonucleoside) or 2-deoxy-beta-D-ribofuranose, (a deoxyribonucleotide) from one side of a membrane to the other. 0.0049056
GO:0047631 ADP-ribose diphosphatase activity Catalysis of the reaction: ADP-ribose + H2O = AMP + D-ribose 5-phosphate. 0.0056840
GO:0004983 neuropeptide Y receptor activity Combining with neuropeptide Y to initiate a change in cell activity. 0.0061951
GO:0051015 actin filament binding Interacting selectively and non-covalently with an actin filament, also known as F-actin, a helical filamentous polymer of globular G-actin subunits. 0.0062409
GO:0070577 lysine-acetylated histone binding Interacting selectively and non-covalently with a histone in which a lysine residue has been modified by acetylation. 0.0068670
GO:0016504 peptidase activator activity Binds to and increases the activity of a peptidase, any enzyme that catalyzes the hydrolysis peptide bonds. 0.0068670
GO:0070628 proteasome binding Interacting selectively and non-covalently with a proteasome, a large multisubunit protein complex that catalyzes protein degradation. 0.0068670
GO:0070006 metalloaminopeptidase activity Catalysis of the hydrolysis of N-terminal amino acid residues from a polypeptide chain by a mechanism in which water acts as a nucleophile, one or two metal ions hold the water molecule in place, and charged amino acid side chains are ligands for the metal ions. 0.0075027
GO:0016787 hydrolase activity Catalysis of the hydrolysis of various bonds, e.g. C-O, C-N, C-C, phosphoric anhydride bonds, etc. Hydrolase is the systematic name for any enzyme of EC class 3. 0.0076576
GO:0005328 neurotransmitter:sodium symporter activity Enables the transfer of a solute or solutes from one side of a membrane to the other according to the reaction: neurotransmitter(out) + Na+(out) = neurotransmitter(in) + Na+(in). 0.0086099
GO:0004198 calcium-dependent cysteine-type endopeptidase activity Catalysis of the hydrolysis of nonterminal peptide bonds in a polypeptide chain by a mechanism using a cysteine residue at the enzyme active center, and requiring the presence of calcium. 0.0090130
showSigOfNodes(GOdataVar.MF, score(MFvar.weight01),
               firstSigNodes = sum(significant.elim),
               wantedNodes = sigTerms)

## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 182 
## Number of Edges = 250 
## 
## $complete.dag
## [1] "A graph with 182 nodes."

I plot variance portion, but for the term found most significant when using p-values, for comparison.

showGroupDensity(GOdataVar.MF, names(score(MF.weight01))[which.min(score(MF.weight01))])

Cellular component

orderedTerms <- names(sort(score(CCvar.weight01)))
significant.weight01 <- score(CCvar.weight01)[orderedTerms] <= 0.01
significant.lea      <- score(CCvar.lea)[orderedTerms] <= 0.01
significant.elim     <- score(CCvar.elim)[orderedTerms] <= 0.01
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
if (length(sigTerms) == 0) {
   sigTerms <- orderedTerms[significant.lea & significant.elim]
}
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
CC.variance.sigTerms <- sigTerms

CCvar.all <- GenTable(GOdataVar.CC, elim=CCvar.elim, weight01=CCvar.weight01, lea=CCvar.lea,
                      orderBy="weight01", ranksOf="elim", topNodes=max(sum(significant.elim), 10))

kable(
   CCvar.all[CCvar.all$Significant > CCvar.all$Expected,],
   caption = "Most over-represented terms of the Cellular Component ontology.")
Most over-represented terms of the Cellular Component ontology.
GO.ID Term Annotated Significant Expected Rank in elim elim weight01 lea
1 GO:0016021 integral component of membrane 885 133 80.29 1 5.7e-23 2.4e-21 5.9e-28
2 GO:0016020 membrane 1565 211 141.99 2 1.6e-10 3.7e-10 < 1e-30
3 GO:0016459 myosin complex 33 11 2.99 4 1.2e-07 1.2e-07 1.2e-07
4 GO:0005576 extracellular region 140 25 12.70 3 7.7e-08 2.7e-06 7.7e-08
5 GO:0008076 voltage-gated potassium channel complex 13 2 1.18 5 6.6e-05 6.6e-05 6.6e-05
6 GO:0005887 integral component of plasma membrane 118 19 10.71 6 1.0e-04 0.00023 6.8e-07
8 GO:0005886 plasma membrane 170 26 15.42 8 0.00022 0.00059 5.7e-11
10 GO:0090575 RNA polymerase II transcription factor c… 23 6 2.09 7 0.00020 0.00258 0.00020
11 GO:0098797 plasma membrane protein complex 34 5 3.08 30 0.01397 0.00313 5.9e-06
12 GO:0005929 cilium 27 7 2.45 11 0.00072 0.00380 1.5e-05
14 GO:0034704 calcium channel complex 5 1 0.45 21 0.00622 0.00622 0.00622
15 GO:0045211 postsynaptic membrane 16 2 1.45 28 0.00906 0.00906 0.00906
17 GO:0005743 mitochondrial inner membrane 33 5 2.99 16 0.00153 0.01470 0.00153
19 GO:0031012 extracellular matrix 9 2 0.82 37 0.02018 0.02018 0.02018
20 GO:0008023 transcription elongation factor complex 10 2 0.91 38 0.02241 0.02241 0.02241
21 GO:0005874 microtubule 29 7 2.63 42 0.02318 0.02318 0.02318
22 GO:0016591 RNA polymerase II, holoenzyme 20 3 1.81 17 0.00265 0.02524 0.00265
23 GO:0044441 ciliary part 21 5 1.91 39 0.02247 0.02644 0.00047
24 GO:0005858 axonemal dynein complex 5 1 0.45 47 0.02762 0.02762 0.02762
26 GO:0099512 supramolecular fiber 31 9 2.81 27 0.00857 0.03429 0.00857
28 GO:0005669 transcription factor TFIID complex 7 1 0.64 52 0.03784 0.03784 0.03784
kable(
   CCvar.all[CCvar.all$Significant < CCvar.all$Expected,],
   caption = "Most under-represented terms of the Cellular Component ontology.")
Most under-represented terms of the Cellular Component ontology.
GO.ID Term Annotated Significant Expected Rank in elim elim weight01 lea
7 GO:0005840 ribosome 115 3 10.43 12 0.00083 0.00038 0.00083
9 GO:0005681 spliceosomal complex 12 1 1.09 15 0.00145 0.00145 0.00145
13 GO:0005634 nucleus 531 48 48.18 13 0.00102 0.00382 2.0e-07
16 GO:0098800 inner mitochondrial membrane protein com… 21 1 1.91 56 0.03838 0.00996 0.03838
18 GO:0016592 mediator complex 24 1 2.18 35 0.01870 0.01870 0.01870
25 GO:0000124 SAGA complex 5 0 0.45 49 0.03003 0.03003 0.03003
27 GO:0031461 cullin-RING ubiquitin ligase complex 15 0 1.36 93 0.10223 0.03779 0.10223
29 GO:0005802 trans-Golgi network 6 0 0.54 57 0.03860 0.03860 0.03860
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
                             lea=significant.lea,
                             elim=significant.elim)))

kable(data.frame(Term=Term(GOTERM[sigTerms]),
                 Definition=Definition(GOTERM[sigTerms]),
                 PValue=score(CCvar.weight01)[sigTerms]),
      caption = paste('Cellular component terms significantly associated with', VAR,
                      'according to all 3 algorithms', sep=' '))
Cellular component terms significantly associated with regime according to all 3 algorithms
Term Definition PValue
GO:0016021 integral component of membrane The component of a membrane consisting of the gene products and protein complexes having at least some part of their peptide sequence embedded in the hydrophobic region of the membrane. 0.0000000
GO:0016020 membrane A lipid bilayer along with all the proteins and protein complexes embedded in it an attached to it. 0.0000000
GO:0016459 myosin complex A protein complex, formed of one or more myosin heavy chains plus associated light chains and other proteins, that functions as a molecular motor; uses the energy of ATP hydrolysis to move actin filaments or to move vesicles or other cargo on fixed actin filaments; has magnesium-ATPase activity and binds actin. Myosin classes are distinguished based on sequence features of the motor, or head, domain, but also have distinct tail regions that are believed to bind specific cargoes. 0.0000001
GO:0005576 extracellular region The space external to the outermost structure of a cell. For cells without external protective or external encapsulating structures this refers to space outside of the plasma membrane. This term covers the host cell environment outside an intracellular parasite. 0.0000027
GO:0008076 voltage-gated potassium channel complex A protein complex that forms a transmembrane channel through which potassium ions may cross a cell membrane in response to changes in membrane potential. 0.0000665
GO:0005887 integral component of plasma membrane The component of the plasma membrane consisting of the gene products and protein complexes having at least some part of their peptide sequence embedded in the hydrophobic region of the membrane. 0.0002347
GO:0005840 ribosome An intracellular organelle, about 200 A in diameter, consisting of RNA and protein. It is the site of protein biosynthesis resulting from translation of messenger RNA (mRNA). It consists of two subunits, one large and one small, each containing only protein and RNA. Both the ribosome and its subunits are characterized by their sedimentation coefficients, expressed in Svedberg units (symbol: S). Hence, the prokaryotic ribosome (70S) comprises a large (50S) subunit and a small (30S) subunit, while the eukaryotic ribosome (80S) comprises a large (60S) subunit and a small (40S) subunit. Two sites on the ribosomal large subunit are involved in translation, namely the aminoacyl site (A site) and peptidyl site (P site). Ribosomes from prokaryotes, eukaryotes, mitochondria, and chloroplasts have characteristically distinct ribosomal proteins. 0.0003808
GO:0005886 plasma membrane The membrane surrounding a cell that separates the cell from its external environment. It consists of a phospholipid bilayer and associated proteins. 0.0005897
GO:0005681 spliceosomal complex Any of a series of ribonucleoprotein complexes that contain snRNA(s) and small nuclear ribonucleoproteins (snRNPs), and are formed sequentially during the spliceosomal splicing of one or more substrate RNAs, and which also contain the RNA substrate(s) from the initial target RNAs of splicing, the splicing intermediate RNA(s), to the final RNA products. During cis-splicing, the initial target RNA is a single, contiguous RNA transcript, whether mRNA, snoRNA, etc., and the released products are a spliced RNA and an excised intron, generally as a lariat structure. During trans-splicing, there are two initial substrate RNAs, the spliced leader RNA and a pre-mRNA. 0.0014522
GO:0090575 RNA polymerase II transcription factor complex A transcription factor complex that acts at a regulatory region of a gene transcribed by RNA polymerase II. 0.0025830
GO:0005929 cilium A specialized eukaryotic organelle that consists of a filiform extrusion of the cell surface and of some cytoplasmic parts. Each cilium is largely bounded by an extrusion of the cytoplasmic (plasma) membrane, and contains a regular longitudinal array of microtubules, anchored to a basal body. 0.0037999
GO:0005634 nucleus A membrane-bounded organelle of eukaryotic cells in which chromosomes are housed and replicated. In most cells, the nucleus contains all of the cell’s chromosomes except the organellar chromosomes, and is the site of RNA synthesis and processing. In some species, or in specialized cell types, RNA metabolism or DNA replication may be absent. 0.0038217
GO:0034704 calcium channel complex An ion channel complex through which calcium ions pass. 0.0062205
GO:0045211 postsynaptic membrane A specialized area of membrane facing the presynaptic membrane on the tip of the nerve ending and separated from it by a minute cleft (the synaptic cleft). Neurotransmitters cross the synaptic cleft and transmit the signal to the postsynaptic membrane. 0.0090582
showSigOfNodes(GOdataVar.CC, score(CCvar.weight01),
               firstSigNodes = sum(significant.weight01),
               wantedNodes = sigTerms)

## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 68 
## Number of Edges = 123 
## 
## $complete.dag
## [1] "A graph with 68 nodes."

For comparison, I plot the distribution of variance portion for the CC term found most significant when using p-values.

showGroupDensity(GOdataVar.CC, names(score(CC.weight01))[which.min(score(CC.weight01))])

Comparison between the two ordering of genes.

Biological process

allTerms <- usedGO(GOdata.BP)
vennDiagram(vennCounts(cbind(PValue   = allTerms %in% BP.pvalue.sigTerms,
                             Variance = allTerms %in% BP.variance.sigTerms)))

ggplot(data = data.frame(PValue   = rank(score(BP.weight01))[allTerms],
                         Variance = rank(score(BPvar.weight01))[allTerms]),
       mapping = aes(x = PValue, y = Variance)) +
   geom_point() + geom_smooth() + xlab('Using gene p-values') +
   ylab('Using portion of explained variance') +
   ggtitle('Ordering of BP terms by significance')

Molecular function

allTerms <- usedGO(GOdata.MF)
vennDiagram(vennCounts(cbind(PValue   = allTerms %in% MF.pvalue.sigTerms,
                             Variance = allTerms %in% MF.variance.sigTerms)))

ggplot(data = data.frame(PValue   = rank(score(MF.weight01))[allTerms],
                         Variance = rank(score(MFvar.weight01))[allTerms]),
       mapping = aes(x = PValue, y = Variance)) +
   geom_point() + geom_smooth() + xlab('Using gene p-values') +
   ylab('Using portion of explained variance') +
   ggtitle('Ordering of MF terms by significance')

Cellular component

allTerms <- usedGO(GOdata.CC)
vennDiagram(vennCounts(cbind(PValue   = allTerms %in% CC.pvalue.sigTerms,
                             Variance = allTerms %in% CC.variance.sigTerms)))

ggplot(data = data.frame(PValue   = rank(score(CC.weight01))[allTerms],
                         Variance = rank(score(CCvar.weight01))[allTerms]),
       mapping = aes(x = PValue, y = Variance)) +
   geom_point() + geom_smooth() + xlab('Using gene p-values') +
   ylab('Using portion of explained variance') +
   ggtitle('Ordering of CC terms by significance')

Session info

I save the main variables to be able to load them in a new R session later.

save(allgenes2GO,
     GOdata.BP, BP.elim, BP.weight01, BP.lea, BP.pvalue.sigTerms,
     GOdata.MF, MF.elim, MF.weight01, MF.lea, MF.pvalue.sigTerms,
     GOdata.CC, CC.elim, CC.weight01, CC.lea, CC.pvalue.sigTerms,
     GOdataVar.BP, BPvar.elim, BPvar.weight01, BPvar.lea, BP.variance.sigTerms,
     GOdataVar.MF, MFvar.elim, MFvar.weight01, MFvar.lea, MF.variance.sigTerms,
     GOdataVar.CC, CCvar.elim, CCvar.weight01, CCvar.lea, CC.variance.sigTerms,
     file = paste('Enrichment', TAG, VAR, 'RData', sep='.'))
sessionInfo()
## R version 3.6.2 (2019-12-12)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.3 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
## 
## locale:
##  [1] LC_CTYPE=ca_ES.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=es_ES.UTF-8        LC_COLLATE=ca_ES.UTF-8    
##  [5] LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=ca_ES.UTF-8   
##  [7] LC_PAPER=es_ES.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
##  [1] grid      stats4    parallel  stats     graphics  grDevices utils    
##  [8] datasets  methods   base     
## 
## other attached packages:
##  [1] Rgraphviz_2.30.0     ggplot2_3.2.1        limma_3.42.0        
##  [4] knitr_1.27           topGO_2.38.1         SparseM_1.78        
##  [7] GO.db_3.10.0         AnnotationDbi_1.48.0 IRanges_2.20.2      
## [10] S4Vectors_0.24.3     Biobase_2.46.0       graph_1.64.0        
## [13] BiocGenerics_0.32.0 
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_0.2.5   xfun_0.12          purrr_0.3.3        splines_3.6.2     
##  [5] lattice_0.20-38    colorspace_1.4-1   vctrs_0.2.1        htmltools_0.4.0   
##  [9] yaml_2.2.0         mgcv_1.8-31        blob_1.2.1         rlang_0.4.2       
## [13] pillar_1.4.3       glue_1.3.1         withr_2.1.2        DBI_1.1.0         
## [17] bit64_0.9-7        matrixStats_0.55.0 lifecycle_0.1.0    stringr_1.4.0     
## [21] munsell_0.5.0      gtable_0.3.0       memoise_1.1.0      evaluate_0.14     
## [25] labeling_0.3       highr_0.8          Rcpp_1.0.3         backports_1.1.5   
## [29] scales_1.1.0       farver_2.0.3       bit_1.1-15.1       digest_0.6.23     
## [33] stringi_1.4.5      dplyr_0.8.3        tools_3.6.2        magrittr_1.5      
## [37] lazyeval_0.2.2     tibble_2.1.3       RSQLite_2.2.0      crayon_1.3.4      
## [41] pkgconfig_2.0.3    zeallot_0.1.0      Matrix_1.2-18      assertthat_0.2.1  
## [45] rmarkdown_2.1      R6_2.4.1           nlme_3.1-143       compiler_3.6.2